Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes
Authors
Abstract
One of the basic facts known for discrete-time Markov decision processes is that, if the probability distribution of an initial state is fixed, then for every policy it is easy to construct a (randomized) Markov policy with the same marginal distributions of state-action pairs as for the original policy. This equality of marginal distributions implies that the values of major objective criteria, including expected discounted total costs and average rewards per unit time, are equal for these two policies. This paper investigates the validity of a similar fact for continuous-time jump Markov decision processes (CTJMDPs). It is shown that this equality takes place for a CTJMDP if the corresponding Markov policy defines a nonexplosive jump Markov process. If this process is explosive, then at each time instance the probability that the state-action pair belongs to a given measurable set of state-action pairs is not greater for the constructed Markov policy than for the original policy. These results are applied to CTJMDPs with expected discounted total costs and average rewards per unit time. It is shown for these criteria that, for each policy, there exists a Markov policy with the same or better value of the objective function.
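For orientation, the discrete-time construction underlying this fact can be sketched as follows (the notation below is illustrative and not taken from the paper). Given an initial distribution \(\mu\) and an arbitrary policy \(\pi\), define a randomized Markov policy \(\varphi\) by conditioning on the current state:
\[
\varphi_t(da \mid x) \;=\; P^{\mu,\pi}\bigl(a_t \in da \,\bigm|\, x_t = x\bigr), \qquad t = 0, 1, 2, \dots
\]
An induction on \(t\) then shows that the one-dimensional marginals of state-action pairs coincide,
\[
P^{\mu,\varphi}\bigl(x_t \in dx,\; a_t \in da\bigr) \;=\; P^{\mu,\pi}\bigl(x_t \in dx,\; a_t \in da\bigr),
\]
and consequently objective values built from these marginals, such as \(\mathbb{E}\sum_{t} \beta^{t} c(x_t, a_t)\), are equal for \(\varphi\) and \(\pi\).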
Similar resources
Sufficiency of Markov Policies for Continuous-Time Markov Decision Processes and Solutions of Forward Kolmogorov Equation for Jump Markov Processes
In continuous-time Markov decision processes (CTMDPs) with Borel state and action spaces and unbounded transition rates, for an arbitrary policy we construct a relaxed Markov policy such that the marginal distribution on the state-action pairs at any time instant is the same for both policies. This result implies the existence of a relaxed Markov policy that performs equally to an arbitrary po...
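The forward Kolmogorov equation mentioned in the title can be sketched, for a nonexplosive jump Markov process with transition intensity kernel \(q\) on a Borel state space \(X\), in the following standard integral form (a sketch; the precise assumptions and formulation in the paper may differ):
\[
P_t(B) \;=\; P_0(B) + \int_0^t \!\int_X \Bigl[\, q\bigl(B \setminus \{x\} \mid x, s\bigr) \;-\; q\bigl(X \setminus \{x\} \mid x, s\bigr)\,\mathbf{1}\{x \in B\} \Bigr]\, P_s(dx)\, ds,
\]
where \(P_t\) denotes the marginal distribution of the process at time \(t\) and \(B\) is any measurable subset of \(X\).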
Continuous time Markov decision processes
In this paper, we consider denumerable state continuous time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions and prove the existence of both average cost optimal stationary policies and a solution of the average optimality equation under these conditions. The results in this paper are applied to an admission con...
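A commonly used form of the average optimality equation for CTMDPs with transition rates \(q(y \mid x, a)\) and cost rates \(c(x, a)\) is, as a sketch (the exact conditions and formulation in that paper may differ):
\[
g \;=\; \inf_{a \in A(x)} \Bigl[\, c(x,a) \;+\; \sum_{y} q(y \mid x, a)\, h(y) \Bigr], \qquad x \in X,
\]
where \(g\) is the optimal long-run average cost, \(h\) is a relative value function, and \(q(y \mid x, a) \ge 0\) for \(y \ne x\) with \(\sum_{y} q(y \mid x, a) = 0\).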
Solving Structured Continuous-Time Markov Decision Processes
We present an approach to solving structured continuous-time Markov decision processes. We approximate the optimal value function by a compact linear form, resulting in a linear program. The main difficulty arises from the number of constraints, which grows exponentially with the number of variables in the system. We exploit the representation of continuous-time Bayesian networks (CTBNs) to de...
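The generic approximate-linear-programming idea behind such approaches can be sketched in discounted discrete-time form (illustrative only; not the exact formulation of that paper): approximate the value function as \(V(x) \approx \sum_i w_i \phi_i(x)\) for fixed basis functions \(\phi_i\) and solve
\[
\min_{w}\; \sum_{x} \alpha(x) \sum_{i} w_i \phi_i(x)
\quad \text{subject to} \quad
\sum_{i} w_i \phi_i(x) \;\ge\; r(x,a) + \gamma \sum_{y} P(y \mid x, a) \sum_{i} w_i \phi_i(y) \quad \text{for all } (x,a),
\]
where \(\alpha\) is a state-relevance weighting; there is one constraint per state-action pair, which is the source of the exponential blow-up mentioned above.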
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
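The restricted-MDP idea can be sketched as follows (a sketch in discounted form, not the accelerated algorithm proposed in that paper). If the SCCs are processed in reverse topological order, then for an SCC \(C\) whose outgoing transitions lead only to already-solved components, the optimal values on \(C\) satisfy a Bellman equation in which the values outside \(C\) enter as known constants:
\[
V(x) \;=\; \min_{a \in A(x)} \Bigl[\, c(x,a) + \gamma \sum_{y \in C} P(y \mid x, a)\, V(y) + \gamma \sum_{y \notin C} P(y \mid x, a)\, V^{*}(y) \Bigr], \qquad x \in C,
\]
so each restricted MDP is solved over the states of a single component, and the partial solutions are then combined level by level.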
Journal
Journal title: Mathematics of Operations Research
Year: 2022
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2021.1169